Naval Research Laboratory
Washington, DC 20375-5320


NRL/MR/5510-06-9023 


A  Search  Relevance  Algorithm 
for  Weather  Effects  Products 


Justin  Nevitt 

Navy  Center  for  Applied  Research  in  Artificial  Intelligence 
Information  Technology  Division 

Don  Brown 

BAE  Systems  Information  Technology 
Herndon,  VA 


December  29,  2006 


Approved  for  public  release;  distribution  is  unlimited. 


REPORT  DOCUMENTATION  PAGE 




1.  REPORT  DATE  (DD-MM-YYYY) 
29-12-2006 


2.  REPORT  TYPE 

Memorandum  Report 


3.  DATES  COVERED  (From  -  To) 
12-2005  to  10-2006 


4.  TITLE  AND  SUBTITLE 


A  Search  Relevance  Algorithm  for  Weather  Effects  Products 


5a.  CONTRACT  NUMBER 


5b.  GRANT  NUMBER 


5c.  PROGRAM  ELEMENT  NUMBER 

0602235N 


6.  AUTHOR(S) 


Justin  Nevitt  and  Don  Brown* 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 

55-7188 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Naval  Research  Laboratory,  Code  5513 
4555  Overlook  Avenue,  SW 
Washington,  DC  20375-5320 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


NRL/MR/5510-06-9023


9.  SPONSORING  /  MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Office  of  Naval  Research 
875  North  Randolph  Street 
Arlington,  VA  22203-1995 


10.  SPONSOR  /  MONITOR’S  ACRONYM(S) 

ONR 


11. SPONSOR / MONITOR’S REPORT
NUMBER(S) 


12.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  is  unlimited. 


13.  SUPPLEMENTARY  NOTES 

*BAE Systems, 2525 Network Place, Herndon, VA 20171


14.  ABSTRACT 

This paper is concerned with providing the user with an efficient way to find information, specifically weather effects products within a
Service Oriented Architecture (SOA). The work outlined in this paper pertains to searching and ranking weather effects products from the EVIS
(Environmental Visualization) data provider. EVIS is a data provider to a Federated Search engine in the NCES (Net-Centric Enterprise
Services) ECB (Early Capabilities Baseline). Several off-the-shelf search solutions are examined and a custom search/relevance algorithm is
discussed. This algorithm is based on the idea that searching weather products is more akin to a database search. The paper concludes with a
look at cross-provider relevance and the complications that arise with a larger-scale, growing SOA.


15.  SUBJECT  TERMS 

NetCentric; SOA; Federated search; Search algorithm; Relevance algorithm


16. SECURITY CLASSIFICATION OF:

a. REPORT: Unclassified
b. ABSTRACT: Unclassified
c. THIS PAGE: Unclassified


17. LIMITATION OF ABSTRACT

UL


18. NUMBER OF PAGES

19


19a. NAME OF RESPONSIBLE PERSON

Justin Nevitt


19b. TELEPHONE NUMBER (include area code)

(202) 767-3365

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std.  Z39.18 


CONTENTS


1 Introduction
2 Motivations
3 Target Users
4 Search Approach
5 The Algorithm
5.1 Search categories
5.2 Current algorithm weights
6 Federated Search Applet
6.1 Simple Search Interface
6.2 Advanced Search Interface
6.3 Search Results
6.4 Products searchable via the GISA tool
7 Cross-Provider Relevance
8 Beyond
REFERENCE


TABLE OF FIGURES


Figure 1. Current NCES SOA Data Providers
Figure 2. Basic Federated Search Interface
Figure 3. Advanced Search Interface
Figure 4. Search Results Display
Figure 5. Geographical Search Interface
Figure 6. An Example Search Output




A  Search  Relevance  Algorithm  for  Weather  Effects  Products 
1  Introduction 

We are situated in an information-rich time [1] in which there is an overabundance of
information, both useful and useless. A basic search on the internet reveals hundreds or
thousands of documents on almost any subject, no matter how obscure. Previously,
access to such specialized information was limited to individuals working in specific
fields. Not too long ago, most information was stored in physical texts
(books, magazines, etc.). Anyone who wished to find a scientific article on a very specific
subject had to physically go to an academic library. Today, the same
information is available on the internet 24 hours a day, seven days a week. The technology to
store and present all this information was available years ago, but was not widely used.
The problem was an overabundance of information in an unusable, unsearchable format.
For instance, in the early days of the internet most navigation was done by hypertext
referrals from one site to another or by word of mouth. Presently, searching has become
second nature to even the most casual internet user. According to Alexa and other page
ranking services, the highest ranked web pages (an estimate of the most accessed) are often
search engines [4] [5]. This suggests that people are navigating the internet by searching
and not through the traditional linking and referral system of the original internet.

As  time  moves  forward  the  amount  of  information  available  increases  quickly.  Wading 
through  it  to  get  to  the  information  one  wants  can  be  difficult  and  possibly  detrimental  to 
task performance [2]. Search engine companies have made fortunes for their owners with
the  deceptively  simple  act  of  searching  and  presenting  websites.  This  speaks  volumes  to 
the  importance  of  a  good  search  tool.  Search,  and  more  generally  the  organization  of 
information,  is  what  makes  a  rich  body  of  information  useful. 

The Department of Defense is trying to keep several steps ahead of the enemy within the
information technology arena. It often stresses the importance of having information at
the fingertips of the warfighter with buzzwords like “information superiority” and
“information warfare” [3]. The idea is that in order to win the conflicts of tomorrow the
United States must have the highest quality information, presented in a timely and easily
understood manner. Again, it is simply not enough to have all the information available.
It must be available and easily consumed by the user. To quote Alberts et al.:

Improvements  in  the  ability  to  share  information  will  contribute  to  improvements 
in  the  ability  to  generate  and  maintain  shared  awareness  which  in  turn,  together 
with  the  greatly  enhanced  facilities  to  collaborate  (quality  of  interaction),  will 
contribute  to  improved  synchronization.  Thus,  advances  in  the  information 
domain  that  result  from  an  improved  ability  to  push  the  envelope  in  the  richness, 
reach,  and  interaction  space  will  affect  processes  in  the  cognitive  domain  which  in 
turn  will  be  reflected  in  the  physical  domain  in  the  form  of  responsiveness, 
adaptability,  agility,  and  flexibility.  These  competencies  will  provide  a  source  of 
competitive advantage in the Information Age [3].

This paper is concerned with providing the user with a way to easily find information,
specifically weather products within the NCES ECB (Net-Centric Enterprise Services,
Early Capabilities Baseline) SOA (Service Oriented Architecture). The work outlined in
this paper pertains to searching and ranking weather products from the EVIS
(Environmental Visualization) portlet. EVIS is a weather effects product generation and
retrieval tool. It allows users to make custom maps with overlays of weather effects on
various military operations. These maps have user-selectable locations, rules, and
dissemination, as well as the ability to add routes. For example, a person planning a
coordinated attack with aircraft and personnel on the ground can make a product that
shows potential weather impacts to both these types of units. This product could then be
saved to the server and shown to anyone involved via a secure network. EVIS is part of a
larger entity called the NCES ECB SOA. In the NCES portal there are a number of data
providers other than EVIS that create and publish many different types of information,
including video surveillance, intelligence reports, and biographies. With all these
providers it is important to have well-designed search functionality so that a user can find
the products that pertain to his mission quickly and without much effort.

2 Motivations

EVIS  is  a  content  provider  within  the  NCES  ECB  web  portal.  In  the  NCES  ECB 
architecture  EVIS  serves  two  essential  duties.  One  is  to  provide  a  workflow  (called 
EMES,  Environmental  Mission  Effects  Services)  that  allows  users  to  create  weather 
effects  products.  These  products  are  useful  in  the  planning  of  military  operations.  The 
second  duty  is  to  present  these  weather  effects  products  to  end  users.  These  end  users  are 
expected  to  be  personnel  possessing  various  levels  of  knowledge  and  at  various  levels  on 
the  chain  of  command.  Some,  if  not  all,  of  these  people  will  be  very  busy.  Some  will 
even  have  people  researching  this  information  for  them.  It  is  very  important  that  they  are 
able  to  retrieve  the  appropriate  products  in  a  timely  manner. 

Like any large web portal, the NCES ECB portal has its own specialized search functionality.
This search capability has been dubbed IFIS (sometimes called FedSearch) and allows
the user to search all the content providers in the portfolio from one interface. It has
some standard and some non-standard search features, such as the ability to search by
date, provider, or geographic location. Unlike a traditional search engine, a Federated
Search engine does not scour all the data available and return matches. Instead, it sends
SOAP (XML) messages to the data providers and requests that they search themselves.
The providers then send back messages with references to the relevant and matching
documents. These messages include metadata about the products, such as date
information and type of data, but they do not include the products themselves [10]. This
is very similar to a library journal search. At first glance it may seem like a backwards
and inefficient way of doing things. However, this is far from the truth. Searching the
public internet with a search engine makes sense because most of the documents one is
searching for are text-based HTML web pages. Requiring each web site to provide
search results back to the search engine for every query a user makes
would be highly inefficient. In the case of NCES, the content providers are serving very
specialized types of information. Some examples include biographies, time-sensitive
reports, trusted intelligence, and video. These providers are expertly familiar with the
content they serve. If FedSearch scoured all the content providers itself, the operation
would be highly inefficient. For instance, each time a content provider was added,
FedSearch would need to be modified. In addition, the content providers know when
new content is added to their databases. They can add this content to their indexes or
searches with minimal effort. FedSearch would have to periodically catalog all available
data to achieve the same state. The Federated schema is highly attractive because it
allows for the easy addition of more data providers with only a minor configuration
change to FedSearch.

Figure 1. Current NCES SOA Data Providers

The requirement for each data provider to search itself and calculate relevance
independently  is  both  a  benefit  and  a  potential  problem.  In  terms  of  structure  and 
content,  each  provider  knows  its  own  data  sources  and  data  stores  better  than  any  other 
entity.  This  knowledge  is  applied  towards  customizing  a  search  algorithm  tailored  to  the 
specific  data  of  each  provider.  In  addition  each  content  provider  is  required  to  return  a 
relevance  score  for  each  document  or  item.  This  relevance  score  is  useful  for  ordering 
products  when  they  are  presented  to  the  user  and  providing  an  easy  to  understand  metric 
for  each  product.  A  proper  ranking  increases  the  likelihood  that  a  user  will  find  the 
product(s) he is looking for faster and with less work. Since each data provider searches
with its own independent algorithm, there is a potential problem in comparing
relevance scores between products from different providers. A lot of thought had to go
into selecting and refining the proper relevance algorithm for each content provider.
Some of the content providers in the NCES portal use off-the-shelf algorithms and others
created their own algorithms (see Figure 1 for an overview of the data providers).
Various search and ranking algorithms were examined for use in EVIS. Here are several
general algorithms that were considered:

Simple text search (TF-IDF) - A simple text search weighs TF (term frequency)
against IDF (inverse document frequency). In other words, it measures how frequently
a word appears in a document against how common that word is across all
documents. This type of algorithm works, but not well with a system such as
EVIS. EVIS products can cover one day or many days, which skews the
frequency of a word relative to its rarity. Also, EVIS products can be searched by
criteria that are not plain text, such as time and lat/long location [8].

Learning algorithms - An algorithm that learns or adjusts its weights has the possibility of
being very good at predicting what the user needs. Development of this type of algorithm
was not possible given the constraints on the use of cookies and other, similar
technologies.

Google PageRank - At the current time this is considered the gold standard in search
engines. However, its ranking system is based largely on a measure of
interconnectedness. A page that is referenced more often is given a higher ranking [9].
This type of measure is useless for EVIS because EVIS products do not link to one
another. Searching EVIS is more akin to querying a database.
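
To make the TF-IDF objection concrete, the sketch below computes one common TF-IDF variant (term count divided by document length, times log(N/df)) over three made-up product texts. The texts, the variable names, and the choice of variant are assumptions for illustration; the report does not say which formulation was evaluated.

```python
import math
from typing import List


def tf_idf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """One common TF-IDF variant: (count / doc length) * log(N / df)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Made-up texts: a one-day product and a multi-day product for the same area.
one_day = "severe helicopter impact iraq".split()
multi_day = ("severe helicopter impact iraq "
             "marginal helicopter impact iraq "
             "acceptable helicopter impact iraq").split()
other = "acceptable personnel impact coast".split()
corpus = [one_day, multi_day, other]

short_score = tf_idf("severe", one_day, corpus)
long_score = tf_idf("severe", multi_day, corpus)
```

Both products report a severe condition, yet the multi-day product receives the lower score for "severe" simply because it is longer. The score also says nothing about time ranges or lat/long locations, which is the second limitation noted above.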

3  Target  Users 

In designing the algorithm it was important to consider the users requesting EVIS
products. In the commercial world software packages often fail because the user base is
not properly assessed. At its widest, the target user base for EVIS is all military personnel who
plan missions, forecast weather, or are involved in executing missions. In the broadest
sense, “mission” can mean anything from a supply convoy to a covert SEAL team
operation. In addition, users may have varying levels of technical knowledge and
familiarity. This can cause problems at both extremes. Advanced users may expect
querying to work in a fashion similar to commercial products, while novice users may
become frustrated if the item they are looking for does not appear within the first few
hits. This information is somewhat useful, in that it provides motivation for creating a
good search and relevance algorithm. More useful is information pertaining to how each
of these people will use the software in their daily routines.

The categories of searchable information were selected based on the queries users
would be likely to submit to find weather products. The first step in designing the algorithm was to
identify the important categories of information in the weather products that people would
use to query them. These were determined through interviews and interactions with
potential users, as well as from what was available in the IFIS engine. The resulting
categories are: time, title, person, location, keyword, effect, and mission.

It is unclear at this stage what type of query will be the most common; however, several
use scenarios are envisioned. Here are some examples:

Repeat user: A user responsible for getting weather effects reports and adding them to
briefings  on  a  particular  region  needs  to  access  the  same  or  similar  products  every  day  or 
so.  This  user  would  benefit  from  being  able  to  search  by  location  and  time. 

Mission  planner:  A  mission  is  being  planned  in  the  very  near  future  and  weather  effects 
would  be  useful  in  making  target  choices.  The  mission  planner  may  want  to  search  by 
location,  effect,  time,  and  keyword  (for  type  of  mission).  A  mission  planner  may  have 
people  working  under  him  who  find  these  products  and  use  them  in  briefings. 

Mission  executor:  A  mission  has  been  planned  and  a  user  knows  there  is  a  specific 
weather  effects  product  available  for  this  mission.  This  user  could  find  it  by  searching 
for  a  keyword  which  may  appear  in  the  title  or  summary. 

On the surface, searching and ranking EVIS products seems like a simple text query. All
EVIS products contain text, and for the most part users are inputting text (they can also
specify date, location, etc.). Such a search service would be somewhat functional and
would  provide  users  with  good  results  some  of  the  time.  However,  the  goal  is  to  provide 
the  user  with  useful  and  relevant  results  consistently.  To  do  this  a  different  search 
paradigm  was  necessary. 

4  Search  Approach 

EVIS weather effects products are much like records in a database. Each one contains
multiple entries in different categories (date, location, etc.) and a differing number of
entries in some categories (effects). To further complicate matters, EVIS products are of
varying sizes, which leads to complications if a simple text search is done.

The  search  and  relevance  algorithm  created  for  EVIS  is  more  akin  to  a  database  search 
tool than anything else. Like a database search tool, it searches in bins (or categories) for
matches to the user’s query. Certain bins carry more weight in a search than others. If,
for instance, someone is searching by date and location, which is more important, and by
how much?

5  The  Algorithm 

The relevance algorithm is technically intertwined with the search algorithm. This choice
was made mainly for efficiency. Searching for products and assigning relevance
scores on the fly is faster than sorting through a list of matching results. The search
algorithm defaults to a simple AND text search of all the fields in all the weather
products that are currently available (meaning that if there are multiple terms and no
operators in a query, they must all be present for a document to be counted as a hit). An AND
search was selected because it eliminates the possibility of getting back clutter from a
specific query. EVIS and IFIS also understand much more complicated queries that
combine Boolean keywords and symbols. For instance, “(not severe or moderate) and
helicopter” is a legal query. As in the scenarios mentioned above, most queries to this
system are expected to be rather specific. However, because the way in which
the system is intended to be used and its actual real-world use could
differ, this behavior is highly configurable.
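
The default AND behavior can be illustrated with a short sketch. It covers only the default case (multiple terms, no operators) and does not attempt the full Boolean grammar; the product fields and the function names matches_all_terms and and_search are hypothetical.

```python
from typing import Dict, List


def matches_all_terms(query: str, product: Dict[str, str]) -> bool:
    """Default AND search: every query term must appear in at least one field."""
    words = " ".join(product.values()).lower().split()
    return all(term in words for term in query.lower().split())


def and_search(query: str, products: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return only the products that contain every term of the query."""
    return [p for p in products if matches_all_terms(query, p)]


# Hypothetical products, loosely modeled on the examples in this report.
products = [
    {"title": "Theater Weather Effects on OPS",
     "mission": "helicopter personnel", "summary": "Iraq"},
    {"title": "Theater Weather Effects on OPS",
     "mission": "helicopter personnel", "summary": "East Coast"},
]
```

With these two products, the query "weather" matches both, while "weather iraq" matches only the first, which is the clutter-reducing effect that motivated the AND default.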

Relevance  is  determined  based  on  several  categories  of  information.  These  categories 
correspond,  almost  one  to  one,  with  the  categories  query  terms  are  expected  to  come 
from.  These  categories  also  correspond  nicely  with  the  data  as  it  is  organized  in  each 
weather product. This makes for a relatively painless implementation. The categories
included in the algorithm are term, term_count, keyword, state_count, time, location,
title, summary, creator, and not. Each of these makes up a portion of the total relevance
score that a particular product receives. The portioning of this score is configurable, so
each category can be given as much weight as needed in determining the final relevance
score. This allows for tweaking as EVIS is put to use. Currently the configuration is set
so that the category term has the greatest sway on the relevance score.

5.1 Search categories

List of categories and how they are calculated:

term - matches of a query word in the fields of a product. Fields include mission, evolution,
operation, and parameter.

term_count - the number of times the term matches.

keyword - gets points if the query term matches a keyword. Keywords are specific to a
data provider, for instance “weather.”

state_count - the number of matches for “severe”, “marginal”, or “acceptable”, if
specified.

time - whether the query time constraint matches the product. If the constraint is "before" or
"after", the value is a function of how close the product is to the constraint. If "contains",
then the maximum number of points is given.

location - whether the product contains the query point or intersects with the query box. This
value is Boolean. At this time Federated Search does not provide a location lookup for
plain text location names (gazetteer). In the future it may, but it is expected that it will
return the same data type for location to content providers.

title - whether terms from the query match terms from the product title.

summary - whether terms from the query match terms from the product summary.

creator - if a query by creator is done, whether there is a match.

not - if the user specifies NOT in his query, score is added for the term not being in the product.
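
The time and location categories in the list above can be sketched as follows. The report does not give the closeness function used for "before" and "after", so the linear decay and the 7-day window here are assumptions, as are the names Box and time_score. The bounding box is a simplification that ignores wrap-around at the date line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Box:
    """Latitude/longitude bounding box in degrees (no date-line wrap)."""
    south: float
    west: float
    north: float
    east: float

    def contains_point(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def intersects(self, other: "Box") -> bool:
        return not (other.west > self.east or other.east < self.west
                    or other.south > self.north or other.north < self.south)


def time_score(constraint: str, query_time: datetime,
               start: datetime, end: datetime,
               window: timedelta = timedelta(days=7)) -> float:
    """Score in [0, 1]. The linear decay and 7-day window are assumptions."""
    if constraint == "contains":
        # Full points when the product's period contains the query time.
        return 1.0 if start <= query_time <= end else 0.0
    if constraint == "before":
        gap = query_time - end      # product must end before the query time
    elif constraint == "after":
        gap = start - query_time    # product must start after the query time
    else:
        raise ValueError(f"unknown constraint: {constraint}")
    if gap < timedelta(0):
        return 0.0
    # Closer products score higher; anything a full window away scores 0.
    return max(0.0, 1.0 - gap / window)
```

The location check returns a plain Boolean, matching the description above, while the time check returns a graded value that a weighting step can scale to the category's maximum points.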

For some of the categories a simple text search is done, scored as the number of matches
against the number of chances to match. There is also the possibility of partial matches with
this algorithm. These categories are summary, title, and term matches. Other categories
are scored as stated above; generally these are either Boolean or number values. Scores
are temporarily stored for each of the results. After this is done, the scores for each
category are normalized against the highest score for that category. This provides a list
of scores that make sense to the end user.
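
A minimal sketch of these two steps follows. It assumes that "chances to match" means the number of query terms, which the report does not state explicitly, and the function names match_ratio and normalize_by_category are hypothetical.

```python
from typing import Dict, List


def match_ratio(query_terms: List[str], field_text: str) -> float:
    """Matches divided by chances to match; a partial match gives a fraction."""
    if not query_terms:
        return 0.0
    words = field_text.lower().split()
    matched = sum(1 for term in query_terms if term.lower() in words)
    return matched / len(query_terms)


def normalize_by_category(raw: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Divide each category score by the highest score seen for that category."""
    categories = {c for scores in raw for c in scores}
    best = {c: max(s.get(c, 0.0) for s in raw) for c in categories}
    return [
        {c: (s.get(c, 0.0) / best[c] if best[c] > 0 else 0.0) for c in categories}
        for s in raw
    ]
```

After normalization the best result in each category scores 1.0 and the others are expressed as fractions of it, so a count-based category such as term_count ends up on the same scale as a ratio-based one such as title.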


5.2 Current algorithm weights:

Term: 15
Term  Count:  10 
Keyword:  5 
State  Count:  10 
Time:  15 
Location:  15 
Title:  10 
Summary:  10 
Creator:  5 
Normalize:  false 

These  weights  are  based  on  the  anticipated  needs  of  the  end  users.  As  those  needs 
change  so  can  the  weights.  Currently  the  weights  emphasize  time,  term  and  location. 
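
One way to read the weights is as the maximum number of points each category can contribute. The sketch below combines per-category scores that way. It is an assumption about how the weights are applied, not the EVIS implementation, and it is not claimed to reproduce the sample scores that follow; the names WEIGHTS and relevance are hypothetical.

```python
from typing import Dict

# Weights as listed above; each category contributes up to this many points.
WEIGHTS: Dict[str, float] = {
    "term": 15, "term_count": 10, "keyword": 5, "state_count": 10,
    "time": 15, "location": 15, "title": 10, "summary": 10, "creator": 5,
}


def relevance(category_scores: Dict[str, float],
              weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-category scores, each assumed to lie in [0, 1]."""
    total = 0.0
    for category, weight in weights.items():
        score = category_scores.get(category, 0.0)
        # Clamp so that a stray out-of-range score cannot exceed the weight.
        total += weight * min(max(score, 0.0), 1.0)
    return total
```

Because the weights live in a single table, re-tuning the algorithm as user needs change amounts to editing that table. The weights listed above sum to 95, so that is the largest value this sketch can return.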

A  sample  search  for  these  query  terms  would  conclude  with  the  following  results  given 
these  listed  products: 

-Products  1  &  2  are  USMC  weather  webpages 

-Product  3  is  for  personnel  and  helicopter  operations  covering  all  of  Iraq. 

-Product  4  is  for  personnel  and  helicopter  operations  covering  the  east  coast  of  the  United 
States. 


Query Terms          Product 1   Product 2   Product 3   Product 4
Weather                     10          10           5           5
Weather Iraq human           0           0          40           0
Human                        0           0          25          25
Iraq                         0           0          10           0

Taken alone, these results are exactly what a search should return. When querying for
“weather”, everything from EVIS is returned. The products returned have low scores
because there is a low likelihood that any single product is the one the user is looking for.
However, they are all given a similar chance at being viewed. The second query in
the table above had three terms: “weather + Iraq + human”. One product was
returned, and this one product had a relatively high score. This is an acceptable result.
The user’s query indicated that he wanted a product that had to do with Iraq or had a human effect,
and wanted something that was weather related. This more specific query returned a
higher score because it was easier to compute confidently that the user would like to
view the product returned.


6  Federated  Search  Applet 

The  following  are  example  screen  shots  of  the  search  interface  applets  a  user  encounters 
in the NCES ECB environment. The interface a user encounters will be similar to those
shown below, but not exactly the same. Many pieces of the interface customize
themselves based on a user’s role attributes and stored preferences.

6.1  Simple  Search  Interface: 


[Screenshot: the basic query form, with a Submit Query field and GO button, a Timeframe
selector (set to Unlimited), a query-type selector (set to Basic Query), and controls for
Stored Search, Show Data Sources, Personal Search Settings, and Viz It!]


Figure 2. Basic Federated Search Interface


This  is  the  most  basic  interface  through  which  users  interact  with  Federated  Search.  This 
interface  is  the  default  interface  exposed  when  a  user  selects  the  search  functionality  in 
the  portal.  It  has  basic  functionality  in  it  such  as  the  ability  to  use  personal  settings,  the 
ability  to  use  stored  search  criteria  and  a  timeframe  limitation  setting. 


6.2 Advanced Search Interface:


[Screenshot: the advanced query form. In addition to the basic controls, it offers fields
for “with the exact phrase”, “with at least one of the words”, and “without the words”;
From and To date selectors; Max Hits/Source (100), Source Timeout (300 sec.), and
Initial Response Timeout (300 sec.) settings; and a list of data sources, each of which can
be set to Search or Don’t Search. The visible data sources include Content Staging, DoS
Biographic Reporting, DoS Message Traffic, EVIS - Theater Weather Effects, generic
resource search, an Improvised Explosive Device Threat Data Provider, and Joint
Intelligence Center Pacific (JICPAC) Messages.]



Figure  3.  Advanced  Search  Interface 

Should  a  user  wish  to  perform  a  more  specific  query  they  can  do  so  by  selecting 
“Advanced  Query”  from  the  drop  down  box  on  the  right  side  of  the  basic  search  screen. 
This  will  display  the  screen  shown  in  Figure  3.  In  addition  to  the  settings  available  in  the 
basic  search  interface  this  interface  offers  the  following:  specific  date  constraint 
selection,  configuring  the  output  parameters  of  a  search  (maximum  hits  gathered, 
timeouts,  etc.),  and  it  allows  for  the  selection/exclusion  of  specific  data  providers. 


6.3 Search Results:


[Screenshot: Federated Search results for the query “weather”, showing 8 results from the
provider “EVIS - Theater Weather Effects”. Each entry lists a title, provider, date,
relevance score (10 or 5 in the visible entries), description, and URL. Controls allow
sorting by Provider, File Type, or Date, showing descriptions, and saving the query.]


Figure  4.  Search  Results  Display 


Upon the successful completion of a query, the user is presented with a screen similar to
that shown in Figure 4. In this interface users can sort results by provider, file
type, or date, and can choose whether to show descriptions. Each result item has the
following information associated with it: a title, a provider name, and a date. Clicking on a
product title brings the user to the selected product.


6.4  Products  Searchable  via  the  GISA  Tool

[Screenshot: the GISA portlet, showing an interactive map of the Middle East region and
a result table listing the Title, Description, Start Date, End Date, and Access URL of a
selected Theater Weather Effects product.]

Figure  5.  Geographical  Search  Interface 


The GISA (Geospatial Information Situational Awareness) portlet shown in Figure 5 is a
tool that allows users to search for products by geographic location. It uses the IFIS
engine to return a list of products, which can be narrowed down using the interactive map
or a list similar to the one in the Federated Search client.


7  Cross-Provider  Relevance 

Federated Search simultaneously returns results from multiple data providers, each
supplying different types of data and using its own relevance algorithm. This becomes a
problem if one provider's algorithm always scores low (or always scores high) relative to
the other providers. A provider that always scores low will, when its products are merged
with those of other providers, always appear low on the list and never see any traffic
from users. This is the biggest problem in computing cross-provider relevance. A quick
solution is to normalize the scores returned by each provider. This, however, is also
problematic. For instance, a user who queries FedSearch for products pertaining to Iraq
weather should receive all EVIS products with Iraq in the title or summary. They
may also receive products that mention weather and Iraq only in passing, perhaps only a
few from a single provider. After normalization, the first product from this provider
would be rated at or close to 100% relevant even though it might not be that relevant,
and the last product returned by this provider might be rated at a floor of 1% relevant
when it should probably be somewhere around 30% or 40%.
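
The normalization pitfall can be made concrete with a minimal Python sketch. It assumes
simple min-max rescaling of each provider's scores onto a 1% to 100% range; the function
name, the provider labels, and the raw scores are hypothetical and are not taken from
the Federated Search implementation.

```python
def normalize_per_provider(hits, floor=1.0, ceiling=100.0):
    """Rescale each provider's raw scores onto [floor, ceiling] independently.

    `hits` is a list of (provider, title, raw_score) tuples. This is an
    assumed min-max scheme used only to illustrate the problem.
    """
    by_provider = {}
    for provider, title, raw in hits:
        by_provider.setdefault(provider, []).append((title, raw))

    normalized = []
    for provider, items in by_provider.items():
        scores = [raw for _, raw in items]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for title, raw in items:
            if span == 0:
                # A single hit (or all ties) is pushed to the top of the scale.
                pct = ceiling
            else:
                pct = floor + (raw - lo) / span * (ceiling - floor)
            normalized.append((provider, title, round(pct, 1)))
    # Merge all providers into one list, highest normalized score first.
    return sorted(normalized, key=lambda h: h[2], reverse=True)


# Hypothetical raw scores: each provider uses its own scale.
hits = [
    ("EVIS", "Weather effects product, Iraq in title (a)", 10),
    ("EVIS", "Weather effects product, Iraq in title (b)", 5),
    ("OtherProvider", "Mentions Iraq weather in passing (x)", 30),
    ("OtherProvider", "Mentions Iraq weather in passing (y)", 25),
    ("OtherProvider", "Mentions Iraq weather in passing (z)", 20),
]

for provider, title, pct in normalize_per_provider(hits):
    print(f"{pct:6.1f}%  {provider}: {title}")
```

In this sketch the best passing mention from the second provider is reported at 100%,
the same as the best EVIS product, while that provider's last hit and the second EVIS
product both fall to the 1% floor. The rescaled numbers reflect only each hit's rank
within its own provider, not how relevant it is to the query.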
Figure 6 is a typical output from the search query “weather”:


[Screenshot: the first page of search results for the query “weather”. The hits come from
the DoS Message Traffic, NGIC Imagery and Video Products, and Pathfinder Data - Iraq
Related Messages providers, with relevance scores ranging from 100 down to 89. Each hit
lists a classification marking, title, highlighted abstract, URL, provider, and date.]

Figure  6.  An  Example  Search  Output 


As Figure 6 shows, many of the products are not weather specific, and EVIS products do
not even show up in the first page of hits.

Work is currently under way to remedy this situation. Cross-provider relevance is not an
unattainable goal. A solution is in the works that looks at secondary characteristics and
context to determine the proper cross-provider relevance scores. These characteristics
include the tendencies of each provider’s algorithm, the content and syntax of the query, and
other related attributes [6]. The hope is that in the future it will be easy to pull data from
various sources and present it to the user in an easy-to-digest form.

8  Beyond 

There are several important points to take away from this development process. The first
is that the user is paramount in making design decisions. In many situations the user is
what makes or breaks an endeavor: the difference between a good product and a
not-so-good product is the way the user perceives it. In this case the users are
military personnel who need accurate and timely weather effects reports. This meant
tailoring the search algorithm to their needs; specifically, it meant weighting the
categories in favor of the ones they are expected to search most often.

The second lesson learned is that off-the-shelf solutions are not always the simplest or
best for a particular problem. This problem required something specific and simple to
solve. A custom algorithm was ultimately not overly complicated to devise and
implement. In addition, this algorithm specifically answers the needs of its target users in
a way that others might not be able to.

The NCES ECB SOA continues to grow in three directions. First, it is adding more data
providers. This is happening slowly but is made possible by the frameworks created
early in the process. Second is data: each provider is serving more and more data. This
is especially true as NCES systems go operational and are used by military
personnel on the SIPRnet. That points to the third area of growth: users. Many of the data
providers get their data and products from users. With more users there is more data, and
with more data there is more potential for confusion.